Quantifying optimal accuracy of local primary sequence bioinformatics methods

Authors

  • Daniel P. Aalberts
  • Eric G. Daub
  • Jesse W. Dill
Abstract

MOTIVATION: Traditional bioinformatics methods scan primary sequences for local patterns. It is important to assess how accurate local primary sequence methods can be. RESULTS: We study the problem of donor pre-mRNA splice site recognition, where the sequence overlaps between real and decoy datasets can be quantified, exposing the intrinsic limitations of the performance of local primary sequence methods. We assess the accuracy of primary sequence methods generally by studying how they scale with dataset size and demonstrate that our new primary sequence ranking methods have superior performance.
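
The idea of quantifying sequence overlap between real and decoy sets can be illustrated with a small sketch. It is not the authors' method: it assumes each candidate site is reduced to a fixed-length window string, and the function names (`optimal_local_accuracy`, `overlap_fraction`) and the toy windows are hypothetical. When an identical window occurs in both sets, a method that sees only that window cannot separate the two, so the best it can do is vote with the majority label for that window.

```python
from collections import Counter


def optimal_local_accuracy(real_windows, decoy_windows):
    """Best accuracy on this sample for any rule that sees only the window string."""
    real_counts = Counter(real_windows)
    decoy_counts = Counter(decoy_windows)
    total = len(real_windows) + len(decoy_windows)
    if total == 0:
        raise ValueError("need at least one window")
    # For each distinct window, label it with whichever class holds more copies;
    # the minority copies are unavoidable errors for a window-only classifier.
    best_correct = sum(
        max(real_counts[w], decoy_counts[w])
        for w in real_counts.keys() | decoy_counts.keys()
    )
    return best_correct / total


def overlap_fraction(real_windows, decoy_windows):
    """Fraction of all windows whose exact sequence occurs in both sets."""
    real_counts = Counter(real_windows)
    decoy_counts = Counter(decoy_windows)
    total = len(real_windows) + len(decoy_windows)
    if total == 0:
        raise ValueError("need at least one window")
    shared = real_counts.keys() & decoy_counts.keys()
    return sum(real_counts[w] + decoy_counts[w] for w in shared) / total


# Toy, made-up windows (assumption: 8-mers around a candidate donor site).
real = ["AGGTAAGT", "AGGTAAGT", "CAGGTGAG", "AAGGTATG"]
decoy = ["AGGTAAGT", "TTGTCCAA", "CCGTACGA", "GTGTTTCA"]

print(optimal_local_accuracy(real, decoy))  # 0.875
print(overlap_fraction(real, decoy))        # 0.375
```

On a finite sample this bound tends to be optimistic, because many windows occur only once and so look perfectly separable. That is one reason to examine how such estimates change as the dataset grows.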

Related articles

Local RNA structure alignment with incomplete sequence

MOTIVATION Accuracy of automated structural RNA alignment is improved by using models that consider not only primary sequence but also secondary structure information. However, current RNA structural alignment approaches tend to perform poorly on incomplete sequence fragments, such as single reads from metagenomic environmental surveys, because nucleotides that are expected to be base paired ar...

SpliceTrap: a method to quantify alternative splicing under single cellular conditions

MOTIVATION Alternative splicing (AS) is a pre-mRNA maturation process leading to the expression of multiple mRNA variants from the same primary transcript. More than 90% of human genes are expressed via AS. Therefore, quantifying the inclusion level of every exon is crucial for generating accurate transcriptomic maps and studying the regulation of AS. RESULTS Here we introduce SpliceTrap, a m...

QOMA: quasi-optimal multiple alignment of protein sequences

MOTIVATION We consider the problem of multiple alignment of protein sequences with the goal of achieving a large SP (Sum-of-Pairs) score. RESULTS We introduce a new graph-based method. We name our method QOMA (Quasi-Optimal Multiple Alignment). QOMA starts with an initial alignment. It represents this alignment using a K-partite graph. It then improves the SP score of the initial alignment th...

Protein secondary structure: entropy, correlations and prediction.

MOTIVATION Is protein secondary structure primarily determined by local interactions between residues closely spaced along the amino acid backbone or by non-local tertiary interactions? To answer this question, we measure the entropy densities of primary and secondary structure sequences, and the local inter-sequence mutual information density. RESULTS We find that the important inter-sequenc...

Relation between weight matrix and substitution matrix: motif search by similarity

MOTIVATION The discovery of patterns shared by several sequences that differ greatly is a basic task in sequence analysis, and still a challenge. Several methods have been developed for detecting patterns. Methods commonly used for motif search include the Gibbs sampler, Expectation-Maximization (EM) algorithm and some intuitive greedy approaches. One cannot guarantee the optimality of the resu...

Journal title:
  • Bioinformatics

Volume 21, Issue 16

Pages -

Publication date 2005